The essence of bootstrapping is that we don't know what the distribution of stock returns is, so we simply use historic stock returns as our sample space
This spreadsheet gives FTSE100 data
Use it to develop a Monte Carlo method that prices an option on the FTSE100
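A minimal sketch of the basic daily bootstrap might look like the following. The function name, the 252 trading days per year, the flat discount rate and the placeholder return list are all assumptions made for illustration; the real inputs would be log returns computed from the FTSE100 spreadsheet. Note that the resampled returns keep their historic drift, so no risk-neutral adjustment is made here.

```python
import math
import random


def bootstrap_call_price(log_returns, spot, strike, n_days, n_paths,
                         rate=0.0, seed=None):
    """Price a European call by resampling historic daily log returns.

    Each path draws n_days returns with replacement from log_returns,
    compounds them onto spot, and the discounted payoffs are averaged.
    """
    rng = random.Random(seed)
    years = n_days / 252.0  # assumption: 252 trading days per year
    total_payoff = 0.0
    for _ in range(n_paths):
        log_growth = sum(rng.choice(log_returns) for _ in range(n_days))
        terminal = spot * math.exp(log_growth)
        total_payoff += max(terminal - strike, 0.0)
    return math.exp(-rate * years) * total_payoff / n_paths


# Placeholder data (hypothetical): replace with returns from the spreadsheet.
sample_returns = [0.004, -0.012, 0.007, 0.001, -0.003, 0.015, -0.009, 0.002]
price = bootstrap_call_price(sample_returns, spot=7000.0, strike=7000.0,
                             n_days=252, n_paths=2000, rate=0.02, seed=1)
```

Every path needs one random draw per trading day, which is where most of the running time goes.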
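Can you think of a problem with this method?
Using day-on-day returns will be very slow
Using day-on-day returns means we will not get any excess kurtosis: each simulated annual return is the sum of many independently drawn daily returns, so the central limit theorem pushes it towards a normal distribution. In other words, we are ignoring any conditional heteroscedasticity (volatility clustering) in the data
Can you think of solutions?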
You could use a longer time period than a day, although this would naturally reduce the sample points
You could use variable lengths of time for each data point
You could use different lengths of time for the sample than you were using in the MC simulation and then scale them
You could make the random selection of each bootstrapped data point depend on where the previous one was chosen from, for example by picking a date within 10 days either side of the previous data point used, so that you effectively perform a random walk around your historic data.
We are going to assess the log returns of the FTSE100 data over 1 year periods.
We only have 34 data points but we can still measure the skew and kurtosis of the returns from this amount of data.
We need the following formulas
Sample Excess Kurtosis $=\frac{\frac{1}{n} \sum_{i=1}^n(x_i-\bar{x})^4}{\left(\frac{1}{n} \sum_{i=1}^n(x_i-\bar{x})^2\right)^2}-3$
Sample Skewness $=\frac{\frac{1}{n} \sum_{i=1}^n(x_i-\bar{x})^3}{\left[\frac{1}{n} \sum_{i=1}^n(x_i-\bar{x})^2\right]^{3/2}}$
Note: Ideally we would use the unbiased estimators at this point but given the reasonably large sample and the intrinsic lack of accuracy in this process the pure sample statistics will probably suffice
Exercise: Write VBA functions to calculate the mean, standard deviation, skewness and kurtosis of a data set
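The exercise asks for VBA, but the same formulas can be sketched in Python as a reference to check a VBA implementation against. The function names are illustrative, and the standard deviation here uses the same $\frac{1}{n}$ divisor as the skewness and kurtosis formulas above rather than the unbiased $\frac{1}{n-1}$ version.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def central_moment(xs, k):
    """k-th central moment with a 1/n divisor, as in the formulas above."""
    m = mean(xs)
    return sum((x - m) ** k for x in xs) / len(xs)


def std_dev(xs):
    # 1/n divisor, consistent with the other sample statistics.
    return math.sqrt(central_moment(xs, 2))


def skewness(xs):
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def excess_kurtosis(xs):
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3.0
```

For a symmetric data set such as 1, 2, 3, 4, 5 the skewness is zero, and the excess kurtosis is negative because the data have no tails.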
This time instead of taking random days from the past we start with one random day from the past and then randomly choose an adjacent day
We continue this process, taking the price moves on successive adjacent days, going either forwards or backwards
By running this simulation thousands of times you can measure the skew and kurtosis of the distribution so generated
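One way to sketch the adjacent-day walk is shown below. The function names are hypothetical, and turning back at either end of the historic data is an assumption, since the notes do not say what to do when the walk reaches the first or last day.

```python
import random


def adjacent_day_indices(n_obs, n_steps, rng):
    """Random walk over the indices 0..n_obs-1 of the historic return series.

    Starts at a uniformly chosen index and moves one day forwards or
    backwards at each step, turning back at either end of the data.
    Assumes n_obs >= 2.
    """
    idx = rng.randrange(n_obs)
    path = [idx]
    for _ in range(n_steps - 1):
        step = rng.choice((-1, 1))
        if not 0 <= idx + step < n_obs:
            step = -step  # reflect at the ends of the history
        idx += step
        path.append(idx)
    return path


def simulate_log_return(log_returns, n_steps, rng):
    """Sum the historic log returns visited by one random walk."""
    visited = adjacent_day_indices(len(log_returns), n_steps, rng)
    return sum(log_returns[i] for i in visited)
```

Calling `simulate_log_return` thousands of times with the same `random.Random` instance gives a sample of simulated returns whose skew and kurtosis can then be measured.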
This time we will use randomly chosen days in the past but then take the return over a one month period.
This model will run quicker as we only need 12 return values per one year simulation
Can you see a problem with this method?
Yes - there are now only 12 times 34 independent data points to choose from
Can you also change your model so that you use whole year segments of the share price return?
This time we randomly choose days in the past and then ratio up the return from that day by an appropriate amount to simulate the return for a whole year
We could also do this for 12 separate months or even take a monthly return and ratio up by $\sqrt{12}$
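One possible reading of "ratio up" is sketched below. It assumes returns in successive periods are independent, so that the mean log return grows in proportion to the number of periods while the deviation from the mean grows with its square root (which is where a factor like $\sqrt{12}$ comes from). The function name and this split between mean and deviation are assumptions, not something the notes specify.

```python
import math


def scale_returns(log_returns, periods):
    """Scale short-period log returns to a horizon of `periods` periods.

    Assumes independent returns: the mean grows with `periods` and the
    deviation from the mean grows with sqrt(periods).
    """
    m = sum(log_returns) / len(log_returns)
    return [periods * m + math.sqrt(periods) * (r - m) for r in log_returns]
```

A rescaling of this kind leaves the skewness and excess kurtosis of the sample unchanged, so the simulated annual returns inherit the shape of the shorter-period returns.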
Each time we perform the bootstrapping in a different way we are trying to recreate the features of the actual share price return distribution.
For each method there are a number of different parameters we can change, and it makes sense to try out different methods until we have a realistic simulation of the actual share we are modelling (in our case the whole FTSE100)
First we need to consider how policies being written, accidents happening and being reported, and claims being paid out are used to fill values into the various run-off triangles that we can use.
Once we have data in a triangle the mathematics of calculating the reserve is the same whatever the triangle represents
First we need to consider a timeline
A policy is sold (written) in 2012
The premium is then earned continuously over the next 12 months
An accident happens in 2013 during the term of the policy
In 2015 this accident is reported to the insurance company and the benefit is believed to be £450
In 2016 this claim is settled for £600
So where do these numbers go in the different triangles?
First we consider IBNR by accident year
The bold line represents the current time at year end 2016
As the accident happened in 2013 and was reported in 2015 we can see this represents development year 3 for accident year 2013
What if we use an underwriting year triangle? Then we group accidents by the year in which the policy covering them was written
This policy was written in 2012, so the accident is reported in development year 4
What about reported but not settled? This time we group accidents by the year in which they were reported
So this accident goes in reported year 2015 and is settled in development year 2
What about the paid triangle? This considers when claims were actually paid out
If this is grouped by accident year then this claim was paid in 2016, which is development year 4 for accident year 2013
But we can also do a paid triangle by underwriting year
This time the policy was written in 2012 and the claim is finally paid out in 2016, that is development year 5
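The placement rule used in each of these cases can be sketched as a one-line function: the origin year counts as development year 1. The function name and the dictionary labels are illustrative only.

```python
def development_year(origin_year, event_year):
    """Development year of an event, counting the origin year as year 1."""
    return event_year - origin_year + 1


# The example claim: written 2012, accident 2013, reported 2015, settled 2016.
written, accident, reported, settled = 2012, 2013, 2015, 2016

# (origin year, development year) of the cell this claim falls into.
cells = {
    "IBNR by accident year": (accident, development_year(accident, reported)),
    "IBNR by underwriting year": (written, development_year(written, reported)),
    "Reported but not settled": (reported, development_year(reported, settled)),
    "Paid by accident year": (accident, development_year(accident, settled)),
    "Paid by underwriting year": (written, development_year(written, settled)),
}
```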
There are a number of different spreadsheets you can look at to back up the calculations in this section of the course:
Classic triangulation methods Basic Reserving Calculations. This spreadsheet contains 6 years of data to illustrate the methods more clearly.
Simplified 4 year spreadsheet (suitable for hand calcs in lecture) Basic reserving (4 year).xls
The chain ladder requires us to follow the steps below
Gather our data into a run-off triangle for whatever kind of reserve we are trying to calculate
Incremental: accident year / development year | 1 | 2 | 3 | 4 |
---|---|---|---|---|
2013 | 50 | 30 | 15 | 5 |
2014 | 60 | 40 | 25 | - |
2015 | 40 | 30 | - | - |
2016 | 80 | - | - | - |
Then we sum along the rows to cumulate the data
Cumulative | 1 | 2 | 3 | 4 |
---|---|---|---|---|
2013 | 50 | 80 | 95 | 100 |
2014 | 60 | 100 | 125 | |
2015 | 40 | 70 | | |
2016 | 80 | | | |
The blank cells represent the future - that we do not yet know. The purpose of this process is to try and make as good an estimate as possible as to what is going to happen in the future
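The cumulation step can be sketched in a few lines, holding each accident year as a list that simply stops where the known data stops. The dictionary layout is an assumption made for illustration.

```python
from itertools import accumulate

# Incremental triangle: one list per accident year, known cells only.
incremental = {
    2013: [50, 30, 15, 5],
    2014: [60, 40, 25],
    2015: [40, 30],
    2016: [80],
}

# Running total along each row gives the cumulative triangle.
cumulative = {year: list(accumulate(row)) for year, row in incremental.items()}
```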
Can you guess a figure you might put in cell(2014,4)?
Guess = $125 \times \frac{100}{95} \approx 132$
What about cell(2015,4)?
We might be tempted to choose $70 \times \frac{100}{80}$, but this would not be a good guess because we have not used the accident year 2014 data that we have.
Cell(2015,3) is more intuitive. This time we have two accident years of data which have been developed for 3 years, so we can use both of these years to guess this cell.
Guess = $70 \times \frac{95+125}{80+100} \approx 86$
We can now see that:
the ratio of development year 4 to development year 3 is just $\frac{100}{95}$ and
the ratio of development year 3 to development year 2 is just $\frac{95+125}{80+100}$
These numbers are called the development factors and once we have calculated them for each development year we can use them to fill in the whole triangle
The following table sets out the calculation as you will often see it in a spreadsheet. A convenient way of organising the data is to sum each column and then subtract the last value in that column: each development factor is the next column's sum divided by this column's sum less its last value
Development year | 1 | 2 | 3 | 4 |
---|---|---|---|---|
Sum of column | 230 | 250 | 220 | 100 |
Last value | 80 | 70 | 125 | 100 |
Sum of column less last value | 150 | 180 | 95 | - |
Dev factor to next year | 1.6667 | 1.2222 | 1.0526 | - |
We often denote the development factors by $f_{1,2}$, $f_{2,3}$ etc.
We should note the relationship $f_{1,3} = f_{1,2} \times f_{2,3}$ etc. and specifically:
$f_{1,n} = f_{1,2} \times f_{2,3} \times f_{3,4} \times \dots \times f_{n-1,n}$ and
$f_{2,n} = f_{2,3} \times f_{3,4} \times \dots \times f_{n-1,n}$ and so on
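The development factors and their products can be sketched as follows, using the cumulative triangle from the worked example. The list-of-lists layout and the function name are assumptions for illustration.

```python
cumulative = [
    [50, 80, 95, 100],   # 2013
    [60, 100, 125],      # 2014
    [40, 70],            # 2015
    [80],                # 2016
]


def development_factors(triangle):
    """Volume-weighted factors f_{d,d+1} from a cumulative triangle."""
    n_dev = len(triangle[0])
    factors = []
    for d in range(n_dev - 1):
        # Only rows with a known value in the next column contribute.
        rows = [row for row in triangle if len(row) > d + 1]
        factors.append(sum(r[d + 1] for r in rows) / sum(r[d] for r in rows))
    return factors


factors = development_factors(cumulative)   # f_{1,2}, f_{2,3}, f_{3,4}

# Factors to the last development year: f_{1,4}, f_{2,4}, f_{3,4}.
to_ultimate = []
running = 1.0
for f in reversed(factors):
    running *= f
    to_ultimate.insert(0, running)
```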
And so we can easily continue to finish off the run-off triangle:
Accident Year | 1 | 2 | 3 | 4 | reserve |
---|---|---|---|---|---|
2013 | 50 | 80 | 95 | 100 | - |
2014 | 60 | 100 | 125 | 132 | 7 |
2015 | 40 | 70 | 86 | 90 | 20 |
2016 | 80 | 133 | 163 | 172 | 92 |
Total IBNR | | | | | 118 |
For each accident year the reserve to be held is the projection to the end of the triangle MINUS the LAST piece of "hard" data for that year. In the case of IBNR - this last piece of data is the last year for which we actually have the accident reports.
The total IBNR (or whatever reserve we are calculating) is then the sum of these values for each accident year
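Putting the steps together, a sketch of the whole chain ladder projection for the example triangle might look like this. The function name and data layout are illustrative assumptions; the arithmetic is the volume-weighted method described above.

```python
def chain_ladder_reserves(triangle):
    """Project each row of a cumulative triangle to the last development year.

    Returns (completed_triangle, reserves), where each reserve is the
    projected final value minus the latest observed cumulative value.
    """
    n_dev = len(triangle[0])
    factors = []
    for d in range(n_dev - 1):
        rows = [row for row in triangle if len(row) > d + 1]
        factors.append(sum(r[d + 1] for r in rows) / sum(r[d] for r in rows))

    completed, reserves = [], []
    for row in triangle:
        full = list(row)
        # Roll the last known value forward one development year at a time.
        for d in range(len(row) - 1, n_dev - 1):
            full.append(full[-1] * factors[d])
        completed.append(full)
        reserves.append(full[-1] - row[-1])
    return completed, reserves


cumulative = [[50, 80, 95, 100], [60, 100, 125], [40, 70], [80]]
completed, reserves = chain_ladder_reserves(cumulative)
total_reserve = sum(reserves)
```

The total is summed from the unrounded reserves, which is why it rounds to 118 even though the rounded row values 7, 20 and 92 add up to 119.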
Issues to consider when looking at development triangles are:
Many issues we come across are similar to issues around handling data
There are many other methods which are variations on a theme of the chain ladder method:
Adjust historic claims values to bring them into line with up to date claims handling practices
Chain ladder is in fact a special case of curve fitting in which we fit the development factors exactly. More generally we could find a curve which was a close approximation to the actual development factors to be fitted
Similar to curve fitting in that data can be cut into different cohorts and then any key features and trends can be analysed before recompiling back into a set of development factors or more general relationship between different development years
A helpful paper is Julian Lowe's "A Practical Guide to Measuring Reserve Variability Using: Bootstrapping, Operational Time and a Distribution Free Approach"
Full password protected spreadsheet
Bootstrapping involves using the historic claims data to simulate the uncertainty in the IBNR claims:
From our basic data
Cumulative | 1 | 2 | 3 | 4 |
---|---|---|---|---|
2013 | 50 | 80 | 95 | 100 |
2014 | 60 | 100 | 125 | |
2015 | 40 | 70 | ||
2016 | 80 | |||
dev factor | 1.6667 | 1.2222 | 1.0526 |
Smoothed (cumulative) | 1 | 2 | 3 | 4 |
---|---|---|---|---|
2013 | 46.64 | 77.73 | 95.00 | 100 |
2014 | 61.36 | 102.27 | 125 | |
2015 | 42.00 | 70 | ||
2016 | 80 |
Original (incremental) | 1 | 2 | 3 | 4 |
---|---|---|---|---|
2013 | 50 | 30 | 15 | 5 |
2014 | 60 | 40 | 25 | |
2015 | 40 | 30 | ||
2016 | 80 |
Smoothed (incremental) | 1 | 2 | 3 | 4 |
---|---|---|---|---|
2013 | 46.64 | 31.09 | 17.27 | 5.00 |
2014 | 61.36 | 40.91 | 22.73 | |
2015 | 42.00 | 28.00 | ||
2016 | 80.0 |
Residuals (original less smoothed) | 1 | 2 | 3 | 4 |
---|---|---|---|---|
2013 | 3.36 | -1.09 | -2.27 | - |
2014 | -1.36 | -0.91 | 2.27 | |
2015 | -2.00 | 2.00 | ||
2016 | - |
Re-arranged residuals | 1 | 2 | 3 | 4 |
---|---|---|---|---|
2013 | -2.27 | -0.91 | -1.36 | 2.27 |
2014 | 3.36 | -2.00 | -1.09 | |
2015 | -1.36 | - | ||
2016 | 2.27 |
Smoothed + Residuals | 1 | 2 | 3 | 4 |
---|---|---|---|---|
2013 | 44.36 | 30.18 | 15.91 | 7.27 |
2014 | 64.73 | 38.91 | 21.64 | |
2015 | 40.64 | 28.00 | ||
2016 | 82.27 |
Reserve = 110.7 (take my word for it)
Bootstrapping is best understood by actually doing it in a spreadsheet (as per e-lecture)
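For readers without the spreadsheet to hand, the steps tabulated above can be sketched in code: back-fit a smoothed triangle from the latest diagonal, take residuals, resample them onto the smoothed incremental values, and rerun the chain ladder on each pseudo triangle. All function names are hypothetical. Sampling the residuals with replacement from the whole pool, including the two zero residuals in the corners, is an assumption about how the re-arrangement is done.

```python
import random


def dev_factors(tri):
    n = len(tri[0])
    out = []
    for d in range(n - 1):
        rows = [r for r in tri if len(r) > d + 1]
        out.append(sum(r[d + 1] for r in rows) / sum(r[d] for r in rows))
    return out


def reserve(tri):
    """Chain ladder reserve: projected final values minus latest diagonal."""
    f = dev_factors(tri)
    total = 0.0
    for row in tri:
        ultimate = row[-1]
        for d in range(len(row) - 1, len(f)):
            ultimate *= f[d]
        total += ultimate - row[-1]
    return total


def cumulate(inc):
    out = []
    for row in inc:
        acc, cum = 0.0, []
        for v in row:
            acc += v
            cum.append(acc)
        out.append(cum)
    return out


def decumulate(tri):
    return [[r[0]] + [r[i] - r[i - 1] for i in range(1, len(r))] for r in tri]


def smoothed_incremental(tri):
    """Back-fit each row from its latest value using the development factors."""
    f = dev_factors(tri)
    fitted = []
    for row in tri:
        back = [row[-1]]
        for d in range(len(row) - 2, -1, -1):
            back.insert(0, back[0] / f[d])
        fitted.append(back)
    return decumulate(fitted)


def bootstrap_reserves(tri, n_sims, seed=None):
    rng = random.Random(seed)
    actual = decumulate(tri)
    fitted = smoothed_incremental(tri)
    # Pool of residuals: actual less smoothed, over every known cell.
    pool = [a - s for ra, rs in zip(actual, fitted) for a, s in zip(ra, rs)]
    sims = []
    for _ in range(n_sims):
        pseudo = [[s + rng.choice(pool) for s in row] for row in fitted]
        sims.append(reserve(cumulate(pseudo)))
    return sims


cumulative = [[50, 80, 95, 100], [60, 100, 125], [40, 70], [80]]
sims = bootstrap_reserves(cumulative, n_sims=1000, seed=1)
```

The list `sims` is then a simulated distribution of the reserve, from which a mean, standard deviation or percentile can be read off.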
The other method for calculating the smoothed triangle builds the smoothed incremental triangle by multiplying a row parameter $\alpha$ (one per accident year) by a column parameter $\beta$ (one per development year)
We can see below that $\beta_1$ is set to be 1.00
Smoothed (Incremental) | $\beta_1 = 1.00$ | $\beta_2$ | $\beta_3$ | $\beta_4$ |
---|---|---|---|---|
$\alpha_1$ | $\alpha_1 \times 1.00$ | $\alpha_1 \times \beta_2$ | $\alpha_1 \times \beta_3$ | $\alpha_1 \times \beta_4$ |
$\alpha_2$ | $\alpha_2 \times 1.00$ | $\alpha_2 \times \beta_2$ | $\alpha_2 \times \beta_3$ | $\alpha_2 \times \beta_4$ |
$\alpha_3$ | $\alpha_3 \times 1.00$ | $\alpha_3 \times \beta_2$ | $\alpha_3 \times \beta_3$ | $\alpha_3 \times \beta_4$ |
$\alpha_4$ | $\alpha_4 \times 1.00$ | $\alpha_4 \times \beta_2$ | $\alpha_4 \times \beta_3$ | $\alpha_4 \times \beta_4$ |
Denote the cumulative claims for accident year $a$ and development year $d$ by $C_{a,d}$
For a 4 by 4 triangle (with accident years 2013 to 2016 numbered 1 to 4) we have that:
$\alpha_1 = \frac{C_{1,4}}{f_{1,2} \times f_{2,3} \times f_{3,4}}$
and $\alpha_2 = \frac{C_{2,3}}{f_{1,2} \times f_{2,3} }$ etc
Now the $\beta$s which are a little harder
$\alpha_1 \times \beta_2 = \frac{C_{1,4}}{f_{2,3} \times f_{3,4}} - \alpha_1$ because this is an incremental value
$\therefore \alpha_1 \times \beta_2 = \alpha_1 \times f_{1,2} - \alpha_1 = \alpha_1 \times (f_{1,2} - 1)$
$\therefore \beta_2 = f_{1,2} - 1$
For $\beta_3$:
We have that
$\alpha_1 \times \beta_3 = \frac{C_{1,4}}{ f_{3,4}} - \alpha_1 - \alpha_1 \times \beta_2$ again because this is an incremental value
$\therefore \alpha_1 \times \beta_3 = \alpha_1 \times f_{1,2} \times f_{2,3} - \alpha_1 - \alpha_1 \times \beta_2$
$\therefore \beta_3 = f_{1,2} \times f_{2,3} -1 - \beta_2$
$\therefore \beta_3 = f_{1,2} \times f_{2,3} - f_{1,2}$ or $ \beta_3 = f_{1,3} - f_{1,2}$
From which we see the general formula for $\beta$ which is:
$ \beta_n = f_{1,n} - f_{1,n-1}$
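As a check on the derivation, the sketch below computes the $\alpha$s and $\beta$s for the worked example and multiplies them together. The variable names are illustrative, and $f_{1,1} = 1$ is taken as a convention so that the general formula also covers $\beta_2$.

```python
factors = [250 / 150, 220 / 180, 100 / 95]   # f_{1,2}, f_{2,3}, f_{3,4}
latest = [100, 125, 70, 80]                  # C_{1,4}, C_{2,3}, C_{3,2}, C_{4,1}

# f_{1,n} for n = 1..4, with f_{1,1} = 1 by convention.
cum_factor = [1.0]
for f in factors:
    cum_factor.append(cum_factor[-1] * f)

n = len(latest)
# alpha_a: latest cumulative value divided back to development year 1.
alpha = [latest[a] / cum_factor[n - 1 - a] for a in range(n)]
# beta_1 = 1 and beta_n = f_{1,n} - f_{1,n-1} for n > 1.
beta = [1.0] + [cum_factor[d] - cum_factor[d - 1] for d in range(1, n)]

# Smoothed incremental triangle, known cells only.
smoothed = [[alpha[a] * beta[d] for d in range(n - a)] for a in range(n)]
```

The rows of `smoothed` agree with the smoothed incremental triangle obtained earlier by dividing back along each row.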
The over-dispersed Poisson model is a variant on standard bootstrapping in which instead of taking the difference between the actual values and the fitted values we take the ratio of the actual to fitted values
These ratios are then randomly arranged around the run-off triangle and then multiplied back into the fitted values to produce a randomised run-off triangle
As with standard bootstrapping, this process is then repeated thousands of times to produce a distribution of reserves
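A sketch of one pass of the ratio version, as described above, is given below. It reuses the original and smoothed incremental values from the worked example; drawing the ratios with replacement and the fixed seed are assumptions made so that the sketch is repeatable.

```python
import random

actual = [[50, 30, 15, 5], [60, 40, 25], [40, 30], [80]]
fitted = [[46.64, 31.09, 17.27, 5.00], [61.36, 40.91, 22.73],
          [42.00, 28.00], [80.00]]

# Ratio of actual to fitted for every known cell.
ratios = [a / f for ra, rf in zip(actual, fitted) for a, f in zip(ra, rf)]

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
# Multiply randomly chosen ratios back into the fitted values.
pseudo = [[f * rng.choice(ratios) for f in row] for row in fitted]
```

Because every ratio is positive, the pseudo incremental values can never go negative, which the additive version cannot guarantee.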